Learning Multiple Models for Reward Maximization
Authors
Abstract
We present an approach to reward maximization in a non-stationary mobile robot environment. The approach works within the realistic constraints of limited local sensing and limited a priori knowledge of the environment. It is based on the use of augmented Markov models (AMMs), a general modeling tool we have developed. AMMs are essentially Markov chains having additional statistics associated with states and state transitions. We have developed an algorithm that constructs AMMs on-line and in real-time with little computational and space overhead, making it practical to learn multiple models of the interaction dynamics between a robot and its environment during the execution of a task. For the purposes of reward maximization in a non-stationary environment, these models monitor events at increasing intervals of time and provide statistics used to discard redundant or outdated information while reducing the probability of conforming to noise. We have successfully implemented this approach with a physical mobile robot performing a mine collection task. In the context of this task, we first present experimental results validating our reward maximization criterion in a stationary environment. We then incorporate our algorithm for redundant/outdated information reduction using multiple models and apply the approach to a non-stationary environment with an abrupt change. Finally, we apply the technique to a simulated version of the task with a gradually shifting environment.
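Read as pseudocode, the abstract suggests a concrete shape for an AMM and for the bank of models kept at increasing time scales. The sketch below is a minimal illustration under assumed details, not the authors' specification: the class names, the doubling-horizon restart schedule, and the drift test are all our assumptions. An AMM here is a Markov chain whose states and transitions carry counts and accumulated reward, updated in O(1) per observation; the monitor keeps a short-window model for recent dynamics alongside longer-lived models, and rebuilds any long-horizon model whose statistics have drifted from the freshest one, discarding outdated information.

# Minimal sketch of augmented Markov models (AMMs) with multi-scale
# monitoring. The window*2**i restart schedule and the scalar drift
# threshold are illustrative assumptions; the paper's actual per-state
# statistics and tests are richer.
from collections import defaultdict

class AMM:
    """Markov chain augmented with visit, transition, and reward statistics."""

    def __init__(self):
        self.visits = defaultdict(int)                      # state -> visit count
        self.trans = defaultdict(lambda: defaultdict(int))  # s -> {s': count}
        self.reward = defaultdict(float)                    # state -> summed reward
        self.prev = None                                    # last state observed

    def update(self, state, reward):
        """Fold one (state, reward) observation into the model: O(1) per event."""
        self.visits[state] += 1
        self.reward[state] += reward
        if self.prev is not None:
            self.trans[self.prev][state] += 1
        self.prev = state

    def transition_prob(self, s, s2):
        total = sum(self.trans[s].values())
        return self.trans[s][s2] / total if total else 0.0

    def mean_reward(self, state):
        n = self.visits[state]
        return self.reward[state] / n if n else 0.0


class MultiScaleMonitor:
    """Several AMMs over horizons that double in length: model i restarts
    every window * 2**i events, so model 0 reflects only recent dynamics
    while higher-index models retain longer history. A long-horizon model
    whose reward estimate drifts from the short-window one is reset,
    discarding outdated statistics."""

    def __init__(self, n_models=4, window=64, drift=0.5):
        self.window = window
        self.drift = drift
        self.models = [AMM() for _ in range(n_models)]
        self.ages = [0] * n_models

    def update(self, state, reward):
        for i in range(len(self.models)):
            self.models[i].update(state, reward)
            self.ages[i] += 1
            if self.ages[i] >= self.window * (2 ** i):  # horizon exceeded: restart
                self.models[i] = AMM()
                self.ages[i] = 0
        short = self.models[0]
        for i in range(1, len(self.models)):
            old = self.models[i]
            if (state in short.visits and state in old.visits and
                    abs(short.mean_reward(state) - old.mean_reward(state)) > self.drift):
                self.models[i] = AMM()                  # outdated data: rebuild
                self.ages[i] = 0

In a control loop, the robot would call monitor.update(current_state, observed_reward) once per event and read reward estimates from whichever surviving model matches its planning horizon.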
Similar Papers
Inverse Reinforce Learning with Nonparametric Behavior Clustering
Inverse Reinforcement Learning (IRL) is the task of learning a single reward function given a Markov Decision Process (MDP) whose reward function is unspecified, together with a set of demonstrations generated by humans/experts. In practice, however, it may be unreasonable to assume that human behaviors can be explained by one reward function, since they may be inherently inconsistent. Also, demonstration...
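The snippet motivates splitting demonstrations into behavior clusters before recovering a reward per cluster. Below is a toy sketch of that idea, not the paper's algorithm: each demonstration is summarized by its discounted empirical feature expectations and grouped by a DP-means-style nonparametric clustering (Kulis and Jordan, 2012), after which a standard IRL solver would be run once per cluster. The feature map phi, the threshold lam, and all names are assumptions for illustration.

# Toy sketch: nonparametric clustering of demonstrations prior to
# per-cluster IRL. All names and parameters are assumptions.
import numpy as np

def feature_expectations(trajectory, phi, gamma=0.95):
    """Discounted empirical feature expectations of one demonstration.
    `trajectory` is a list of states; `phi` maps a state to a vector."""
    return sum((gamma ** t) * phi(s) for t, s in enumerate(trajectory))

def dp_means(points, lam, n_iter=20):
    """DP-means clustering: a new cluster is opened whenever a point is
    farther than `lam` from every existing centroid, so the number of
    behavior clusters is not fixed in advance."""
    centroids = [points[0].copy()]
    assign = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        for i, x in enumerate(points):
            d = [np.linalg.norm(x - c) for c in centroids]
            j = int(np.argmin(d))
            if d[j] > lam:                  # open a new cluster
                centroids.append(x.copy())
                j = len(centroids) - 1
            assign[i] = j
        for j in range(len(centroids)):     # recenter each cluster
            members = [points[i] for i in range(len(points)) if assign[i] == j]
            if members:
                centroids[j] = np.mean(members, axis=0)
    return assign, centroids

# Usage (assuming `demos` and `phi` exist): summarize each demo,
# cluster, then run any standard IRL solver once per cluster instead
# of once over all demonstrations.
# mus = np.stack([feature_expectations(tau, phi) for tau in demos])
# labels, _ = dp_means(mus, lam=1.0)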
The Cyber Rodent Project: Exploration of Adaptive Mechanisms for Self-Preservation and Self-Reproduction
The aim of the Cyber Rodent project is to understand the origins of our reward and affective systems by building artificial agents that share the same intrinsic constraints as natural agents: Self-preservation and self-reproduction. A Cyber Rodent is a robot that can search for and recharge from battery packs on the floor and copy its programs to a nearby agent through its infrared communicatio...
Selection Criteria for Neuromanifolds of Stochastic Dynamics
We present ways of defining neuromanifolds – models of stochastic matrices – that are compatible with the maximization of an objective function such as the expected reward in reinforcement learning theory. Our approach is based on information geometry and aims at reducing the number of model parameters in the hope of improving gradient learning processes.
A Biologically Plausible 3-factor Learning Rule for Expectation Maximization in Reinforcement Learning and Decision Making
One of the most frequent problems in both decision making and reinforcement learning (RL) is expectation maximization involving functionals such as reward or utility. Generally, these problems reduce to computing the optimum of a density function. Instead of trying to find this exact solution, a common approach is to approximate it through a learning process. In this work we propose a...
Inverse Reinforcement Learning with Locally Consistent Reward Functions
Existing inverse reinforcement learning (IRL) algorithms have assumed each expert’s demonstrated trajectory to be produced by only a single reward function. This paper presents a novel generalization of the IRL problem that allows each trajectory to be generated by multiple locally consistent reward functions, hence catering to more realistic and complex experts’ behaviors. Solving our generali...